Cory Whitney
Open RStudio
Type ? followed by a function, package or data set name in the R console to open its help page
Search the web for a copy of the error message with "R" added to the query
Help > Cheatsheets > Data Visualization with ggplot2
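The help lookups above can be sketched in the console as follows. This is a minimal illustration; barplot, iris and the graphics package are chosen here only because they ship with every R installation.

```r
# Open the help page for a function
?barplot

# Equivalent long form of the same lookup
help("barplot")

# Help page for a built-in data set
?iris

# Overview of the documentation of a whole package
help(package = "graphics")
```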
Hadley Wickham
Load data
participants_data <- read.csv("participants_data.csv")
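After loading, it usually helps to check that the data arrived as expected. The sketch below writes a small stand-in file first so that it runs without participants_data.csv; the column names are taken from the plots in this tutorial, but the values and the names toy, csv_path and participants_toy are invented for illustration only.

```r
# Toy stand-in for participants_data.csv (invented values, illustration only)
toy <- data.frame(
  academic_parents = c("Y", "N", "Y"),
  letters_in_first_name = c(4, 7, 5),
  days_to_email_response = c(1, 3, 2)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the file back in, as in the tutorial
participants_toy <- read.csv(csv_path)

str(participants_toy)      # column names and types
head(participants_toy)     # first rows
summary(participants_toy)  # quick summary of every column
```

The same three calls work on participants_data once your own file is loaded.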
R has several systems for making graphs
table() and barplot() functions
participants_barplot <- table(participants_data$academic_parents)
barplot(participants_barplot)
Bar plot of number of observations of binary data related to academic parents
Many libraries and functions for graphs in R…
ggplot2 is one of the most elegant and most versatile.
ggplot2 implements the grammar of graphics to describe and build graphs.
Do more and do it faster by learning one system and applying it in many places.
Learn more about ggplot2 in “The Layered Grammar of Graphics”
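One way to see the layered idea is to build a plot step by step. The sketch below uses the built-in mtcars data set rather than the course data; the object names p, p_points and p_final are only illustrative.

```r
library(ggplot2)

# 1. Data and aesthetic mappings only: no layer yet, so just empty axes
p <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# 2. Add a geom layer to draw the observations as points
p_points <- p + geom_point()

# 3. Add a second layer and axis labels; each `+` returns a new plot object
p_final <- p_points +
  geom_smooth(method = "lm") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Printing the object draws the plot
p_final
```

Because every step is an ordinary R object, the same base plot can be reused with different layers.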
Example from your data
library(ggplot2)
ggplot(data = participants_data,
       aes(x = letters_in_first_name,
           y = days_to_email_response)) +
  geom_point()
Scatterplot of days to email response as a function of the letters in your first name
Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/
ggplot(data = participants_data,
       aes(x = letters_in_first_name,
           y = days_to_email_response,
           color = academic_parents,
           size = working_hours_per_day)) +
  geom_point()
Scatterplot of days to email response as a function of the letters in your first name, with colors representing binary data related to academic parents and working hours per day as bubble sizes.
Make more graphs
Example from Anderson's iris data set
ggplot(data = iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species,
           size = Petal.Width)) +
  geom_point()
Scatterplot of iris petal length as a function of sepal length with colors representing iris species and petal width as bubble sizes.
Create a smaller diamonds data set (top 100 rows) for a scatterplot with carat on the x-axis and price on the y-axis, and with the color of the diamond as the color of the points.
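library(dplyr) # provides top_n()
# Without a wt argument, top_n() ranks by the last column (z) and keeps ties
dsmall <- top_n(diamonds, n = 100)
ggplot(data = dsmall, aes(x = carat,
                          y = price,
                          color = color)) +
  geom_point()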
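There are several ways to take a subset, and they do not select the same rows. As a rough sketch (by_rank, first_rows and by_price are illustrative names, and ranking by price is only one possible choice), the options can be compared like this:

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# top_n() without wt ranks by the last column (z); ties are kept,
# so the result can contain more than 100 rows
by_rank <- top_n(diamonds, n = 100)

# head() simply takes the first 100 rows in their stored order
first_rows <- head(diamonds, n = 100)

# slice_max() makes the ranking column explicit and can drop ties
by_price <- slice_max(diamonds, order_by = price, n = 100, with_ties = FALSE)

# Compare the number of rows each approach returns
nrow(by_rank)
nrow(first_rows)
nrow(by_price)
```

Which subset is appropriate depends on what the plot is meant to show.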
The geom_*() functions define the type of plot, e.g. geom_point(), geom_line(), geom_boxplot(), geom_path() or geom_smooth(). Geoms can also be combined by adding them to the same plot as separate layers.
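As a small sketch of combining geoms, the built-in pressure data set (chosen here only for convenience) can be drawn as a line with the measured points on top. The name p_combined is illustrative.

```r
library(ggplot2)

# Two geoms share the same data and aesthetic mappings
p_combined <- ggplot(data = pressure,
                     aes(x = temperature,
                         y = pressure)) +
  geom_line() +   # first layer: connect the observations
  geom_point()    # second layer: mark each observation

p_combined
```

Layers are drawn in the order they are added, so the points sit on top of the line.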
# Create a smaller data set of diamonds with about 50 rows.
# top_n() is from dplyr; with ties it can return more than n rows.
library(dplyr)
dsmall <- top_n(diamonds,
                n = 50)
# Create a scatterplot with a smoothed conditional
# means overlay, with carat on the x-axis
# and price on the y-axis.
ggplot(data = dsmall,
       aes(x = carat,
           y = price)) +
  geom_point() +
  geom_smooth()
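By default, geom_smooth() selects a smoothing method based on the size of the data: loess for fewer than 1,000 observations, otherwise a GAM from the mgcv package. Use method = to specify your preferred smoothing method.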
# Create a smaller data set of diamonds with about 50 rows.
# top_n() is from dplyr; with ties it can return more than n rows.
library(dplyr)
dsmall <- top_n(diamonds,
                n = 50)
# Create a scatterplot with a smoothed conditional
# means overlay, with carat on the x-axis
# and price on the y-axis.
# Use 'glm' as the smoothing method.
ggplot(data = dsmall,
       aes(x = carat,
           y = price)) +
  geom_point() +
  geom_smooth(method = 'glm')
ggplot2 lines and smoothing options
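To compare smoothing options side by side, something like the following could be tried. It is only a sketch: the random sample of 200 diamonds, the seed, and the span value are arbitrary choices made for this illustration.

```r
library(ggplot2)

# Arbitrary random sample of 200 diamonds, fixed seed for repeatability
set.seed(1)
d <- diamonds[sample(nrow(diamonds), 200), ]

base_plot <- ggplot(data = d,
                    aes(x = carat,
                        y = price)) +
  geom_point()

# Straight-line fit, without the confidence band
p_lm <- base_plot + geom_smooth(method = "lm", se = FALSE)

# Local regression; a smaller span follows the data more closely
p_loess <- base_plot + geom_smooth(method = "loess", span = 0.5)

p_lm
p_loess
```

The se argument switches the shaded confidence band on or off, and span only applies to loess.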
geom_boxplot()
# Create a boxplot where the x-axis is cut and
# the y-axis is price divided by carat
ggplot(data = diamonds,
       aes(x = cut,
           y = price / carat)) +
  geom_boxplot()
geom_jitter() shows all points
# Create a jittered boxplot where the x-axis is color and
# the y-axis is price divided by carat
ggplot(data = diamonds,
       aes(x = color,
           y = price / carat)) +
  geom_boxplot() +
  geom_jitter()
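With all rows of diamonds, the jittered points can hide the boxes underneath. One possible way to keep both visible is sketched below; the width and alpha values are arbitrary starting points, and p_jitter is an illustrative name.

```r
library(ggplot2)

p_jitter <- ggplot(data = diamonds,
                   aes(x = color,
                       y = price / carat)) +
  # Draw the points first: narrow horizontal spread, mostly transparent
  geom_jitter(width = 0.2, alpha = 0.05) +
  # Draw the boxes on top; hide outlier dots (already shown as points)
  # and leave the boxes unfilled so the points stay visible
  geom_boxplot(outlier.shape = NA, fill = NA)

p_jitter
```

Lower alpha values suit large data sets, while a small data set may not need transparency at all.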
After you have gone through the tutorial, please do the following exercises.